Patent abstract:
Embodiments of the present invention include a memory unit and a processor connected to the memory unit. The processor may be used to group a plurality of subsets of data from an input data stream and to calculate a first hash value corresponding to a first grouped subset of data. In addition, the processor may be used to detect a match between the first hash value and a second hash value stored in a hash table. The processor is also configured to monitor a hash value match frequency of the input data stream, the processor being operable to increment a counter value in response to a match detection and to determine a level of entropy for the input data stream based on the counter value with respect to a frequent hash value match threshold. The processor may generate an instruction for either initiating the execution of a data compression operation when the counter value reaches or exceeds the frequent hash value matching threshold, or abstaining from executing the data compression operation when the counter value fails to reach the frequent hash value matching threshold.
Publication number: FR3037677A1
Application number: FR1655618
Filing date: 2016-06-16
Publication date: 2016-12-23
Inventors: Ashwin Narasimha; Ashish Singhai; Vijay Karamcheti; Krishanth Skandakumaran
Applicant: HGST Netherlands BV
Primary IPC class:
Patent description:

[0001] FIELD OF THE INVENTION: The present invention relates generally to the field of data reduction technology.
[0002] BACKGROUND OF THE INVENTION: The memory subsystems of the non-volatile, high-performance category generally consist of relatively expensive components. It is therefore highly desirable to maximize the storage of data in such systems by employing data reduction techniques. Data reduction refers to the techniques of data self-compression and data deduplication used to reduce the total amount of information that is written to or read from a primary storage system. Data reduction results in the transformation of the user's (input) data into a more compact representation that can be stored. The benefits of data reduction include improved storage utilization, increased service life (in the context of an all-flash storage system), and application acceleration, among other benefits.

[0003] Data compression refers to the process of finding redundancy within the same block of data and then coding these repeated sequences in such a way as to reduce the overall size of the data. Data deduplication refers to the process of matching data sequences between multiple blocks in an effort to find matching sequences, even if the individual block includes incompressible data. Conventional systems, however, perform data compression and data deduplication as discrete steps within the data reduction process. Indeed, these conventional systems do not combine them in one step and therefore pay latency and bandwidth penalties.

[0004] In addition, conventional data reduction solutions require many cycles and a lot of power to perform the compression functions. In any application data flow, there is a high probability that a particular set of data blocks does not exhibit self-compression properties. At the end of a compression step, conventional solutions typically perform a check to ensure that the result is not larger than the original block, which is rather late, since the resources have already been used in the attempt to compress the data.

[0005] SUMMARY OF THE INVENTION: Therefore, there is a need for a solution that creates a unified data path that performs both data compression and deduplication in a single pass. Embodiments of the present invention combine data compression technologies and extend them by integrating them with data deduplication methods. The one-pass nature of the embodiments of the present invention allows the latencies of the system to be controlled and assists in performing on-line compression and deduplication at higher throughput (e.g., at PCIe Gen3 speeds for a given FPGA, or to meet other speed requirements or standards).

[0006] Embodiments of the present invention employ smaller subsets of data, such as 4 kilobyte data blocks, for compression and can extend compression copy-coding formats to differentiate a self-referenced copy from a copy of a reference block. It should be appreciated that the embodiments are not limited to 4 kilobyte data blocks and that any block size or block size range can be used (e.g., 4K, 8K, 10K, a block size range from 4 KB to 8 KB, etc.). Embodiments can create memory buffer structures that have multiple parallel input buffers to hold the reference data blocks. Also, embodiments may include a parallel hash table lookup scheme in which lookups corresponding to the data stored in the reference data block buffers can be performed concurrently with the hash lookups performed for the data stored in the input data buffers.
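As an illustration only, the following C sketch outlines the single-pass structure described above: the input is handled in fixed-size blocks and, for each position, a single hash computation feeds both the reference (deduplication) lookup and the compression lookup. The block size, table sizes, hash function and all identifiers are assumptions made for this example and are not taken from the patent text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TBL_BITS 14
#define TBL_SIZE (1u << TBL_BITS)
#define EMPTY    0xFFFFFFFFu

typedef struct {
    uint32_t ref_tbl[TBL_SIZE];     /* offsets into the reference block     */
    uint32_t cmp_tbl[TBL_SIZE];     /* offsets into the current input block */
} tables_t;

static uint32_t hash4(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, 4);
    return (v * 2654435761u) >> (32 - TBL_BITS);
}

/* One pass over one block (e.g., 4 KB): each position is hashed once, and
 * that single hash probes both tables, so deduplication and compression
 * candidates are discovered together rather than in two separate passes. */
void single_pass_scan(tables_t *t, const uint8_t *block, size_t len)
{
    for (uint32_t pos = 0; pos + 4 <= len; pos++) {
        uint32_t h = hash4(block + pos);
        uint32_t ref_candidate = t->ref_tbl[h];   /* possible reference copy */
        uint32_t cmp_candidate = t->cmp_tbl[h];   /* possible local copy     */

        (void)ref_candidate;                      /* a real encoder would    */
        (void)cmp_candidate;                      /* verify and emit here    */
        t->cmp_tbl[h] = pos;                      /* remember this position  */
    }
}
```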
In addition, the embodiments may use the refill time of the reference data buffers to compute and store the sliced hash function values of the reference data, for the purpose of improving data reduction performance. Embodiments may also create a mutual lock between the calculations of the reference hash table and the start of compression. Thus, when the compression starts, lookups can be performed in the reference hash table, in a compression hash table, or in both. Embodiments of the present invention may use heuristics to determine which sequence to use (if any) when a hash hit is detected in one or more of the hash tables. In addition, embodiments of the present invention may modify the interpretation of prior references to refer either to the input data stream or to the input reference buffer.

[0008] In addition, embodiments of the present invention can detect very early and predict the compressibility of the blocks in order to minimize unnecessary effort and avoid a decline in overall system performance. The embodiments described herein can analyze the compressibility characteristics of the data in order to make a decision whether to perform data reduction procedures, such as compression, on a given data block. Low-impact, high-performance entropy detection operations are thus provided in a manner that allows a high-performance data reduction system to save power and compression-unit cycles when incompressible data is encountered.

[0009] BRIEF DESCRIPTION OF THE DRAWINGS: The accompanying drawings, which are included in and form part of this specification, and in which like numerals designate like elements, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the invention.

Figure 1A is a block diagram showing an example hardware configuration of an on-line compression and deduplication system capable of performing dual parallel compression and deduplication procedures for data reduction purposes in accordance with embodiments of the present invention.

Figure 1B is a block diagram showing exemplary components provided in the memory for performing on-line compression and deduplication procedures in accordance with embodiments of the present invention.

Figure 1C shows an example of a compressed format for framing the data generated in accordance with embodiments of the present invention.

Figure 1D shows an example of a combined reference hash table and compression hash table lookup scheme according to embodiments of the present invention.

Figure 2A is a process diagram of a first portion of an exemplary process for one-pass entropy detection in accordance with embodiments of the present invention.

Figure 2B is a process diagram of a second portion of an exemplary process for one-pass entropy detection in accordance with embodiments of the present invention.

Figure 3A is a process diagram of an example of a simultaneous data deduplication and compression process in accordance with embodiments of the present invention.

Figure 3B is a process diagram of an exemplary process for performing hash table lookup procedures in accordance with embodiments of the present invention.
DETAILED DESCRIPTION: [0018] Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Although the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover variants, modifications and equivalents which may be included in the spirit and scope of the invention as defined by the appended claims. In addition, in the following detailed description of the embodiments of the present invention, many specific details are set forth in order to provide a thorough understanding of the present invention. However, one skilled in the art will recognize that the present invention can be implemented without these specific details. In other cases, well known methods, procedures, components and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention. Although a method may be described as a sequence of numbered steps for clarity, the numbering does not necessarily dictate the order of the steps. It should be understood that some of the steps may be skipped, executed in parallel or executed without it being absolutely necessary to maintain a strict sequence order. The drawings which represent the embodiments of the invention are semi-schematic and are not to scale, and some dimensions are particularly indicated for the sake of clarity of presentation and are shown exaggerated in the drawing figures. Likewise, although the views in the drawings generally represent similar orientations to facilitate the description, this representation in the figures is for the most part arbitrary. In general, the invention can operate in any orientation.

NOTATION AND NOMENCLATURE: It should be borne in mind, however, that all of these terms and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless otherwise specifically indicated in the discussion below, it is considered that, in the context of the present invention, discussions using terms such as "reception" or "selection" or "generation" or "grouping" or "monitoring" or the like refer to the actions and processes of a computer system or similar electronic computing device that manipulates and transforms data represented as physical (e.g., electronic) quantities within the registers and memories of the computer system and other computer-readable media into other data similarly represented as physical quantities within the memories or registers of the computer system or other such information storage, transmission or display devices. When a component appears in more than one embodiment, the use of the same reference number means that the component is the same component as that shown in the original embodiment.
EXAMPLE OF ON-LINE COMPRESSION AND DEDUPLICATION SYSTEM CONFIGURATION: [0022] Figure 1A is a block diagram showing an example hardware configuration of an on-line compression and deduplication system (e.g., system 100) capable of performing dual parallel compression and deduplication procedures for data reduction purposes in accordance with embodiments of the present invention. In this way, the system 100 can perform data reduction procedures in a single pass so that operations related to data reduction, such as data compression and data deduplication, are combined into a single process, a single process path, or a single step, reducing overall system latency and/or bandwidth penalties. Although specific components are disclosed in Figure 1A, it should be appreciated that said components are exemplary.
This means that the embodiments of the present invention are well suited to having various other hardware components or variants of the components mentioned in Figure 1A. It is appreciated that the hardware components in Figure 1A may operate with components other than those shown and that not all of the hardware components described in Figure 1A are necessary to achieve the objectives of the present invention. In accordance with some embodiments, the components shown in Figure 1A may be combined to achieve the objects of the present invention. The system 100 may be implemented in the form of an electronic device capable of communicating with other electronic devices through a data communications bus. The bus 106, for example, represents such a data communications bus. The exemplary system 100 upon which embodiments of the present invention can be implemented includes a general purpose computer system environment. In its simplest configuration, the system 100 usually includes at least one processing unit 101 and a memory storage unit. The computer readable storage medium 104, for example, represents such a memory storage unit. Depending on the exact configuration and type of device, the computer-readable storage medium 104 may be volatile (such as RAM), non-volatile (such as ROM or flash memory) or any combination of the two.
Portions of the computer readable storage medium 104, when executed, facilitate efficient execution of memory operations or queries for groups of elementary tasks. In one embodiment, the processor 101 may be a programmable circuit configured to perform the on-line compression and deduplication operations described herein. The processor 101 may, for example, be an FPGA controller or a flash memory device controller. Alternatively, in one embodiment, the processor 101 may be used to execute an on-line compression and deduplication program stored in the computer readable storage medium 104 and configured to perform the functions described herein (see, for example, Figure 1B described later). The system 100 may also include an optional graphics system 105 for presenting information to the user of the computer, for example by displaying information on an optional display device 102. The system 100 also includes an optional alphanumeric input/output device 103. The input/output device 103 may include an optional cursor control or guidance device and one or more signal communication interfaces, such as a network interface card. In addition, the interface module 115 includes functionality to allow the system 100 to communicate with other computer systems through an electronic communications network (e.g., the Internet, wired communication networks, wireless communication networks or similar networks).

In addition, the system 100 may also have additional features and capabilities. The system 100 may, for example, also include additional storage media (removable and/or non-removable), including, but not limited to, magnetic or optical disks or tapes. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any information storage method or technology, such as computer readable instructions, data structures, program modules or other data.

Figure 1B is a block diagram showing exemplary components provided in the memory for performing on-line compression and deduplication procedures in accordance with embodiments of the present invention. Although specific components are disclosed in Figure 1B, it should be appreciated that said computer storage medium components are exemplary. This means that the embodiments of the present invention are well suited to having various other components or variants of the computer storage medium components mentioned in Figure 1B. It is appreciated that the components in Figure 1B can operate with components other than those shown, and that not all of the computer storage medium components described in Figure 1B are necessary to achieve the objects of the present invention. In accordance with some embodiments, the components described in Figure 1B may be combined to achieve the objects of the present invention. In addition, it is appreciated that certain hardware components described in Figure 1A may work in combination with certain components described in Figure 1B to achieve the objects of the present invention.

As shown in Figure 1B, the computer readable storage medium 104 includes an operating system 107. The operating system 107 is loaded into the processor 101 when the system 100 is initialized. Similarly, when executed by the processor 101, the operating system 107 may be configured to provide a programming interface to the system 100. The system 100 may also include wireless communication mechanisms.
Through such devices, the system 100 may be communicatively connected to other computer systems through a communication network such as the Internet or an intranet, such as a local area network. In addition, as illustrated in Figure 1B, the computer readable storage medium 104 comprises a computerized fingerprint calculation engine 110. The computerized fingerprint calculation engine 110 includes functionality for generating fingerprints using a sequence of bytes to perform authentication and/or lookup procedures. Upon detecting receipt of a data stream, the buffer management controller 112 may communicate signals to the computerized fingerprint calculation engine 110 to process the data stored in the data input buffer 112-1 upon its reception.

[0029] The fingerprints generated by the computerized fingerprint calculation engine 110 can be used to represent larger files while using a fraction of the storage space that would otherwise be needed for storing such larger files. Larger files may include, for example, content pages or media files. The computerized fingerprint calculation engine 110 may employ conventional computer-implemented procedures, such as hash functions, to reduce the data streams into data bits to generate fingerprints, so that they can be processed by other components of the system 100, such as the computerized signature calculation engine 113. The computerized hashing calculations can be performed in a manner consistent with the manner in which the other components of the system 100, such as the hash table module 111, calculate hash values, or in a different manner. In this manner, the computerized fingerprint calculation engine 110 may be configured to generate fingerprints for a subset of incoming data associated with a data stream while it is received by the system 100. The subset of data may be in the form of increments of 4 kilobytes, for example. In one embodiment, the computerized fingerprint calculation engine 110 can calculate the fingerprints for an incoming 4 kilobyte set associated with a data stream received by the system 100 and stored in the data input buffer 112-1 generated by the buffer management controller 112.

The computerized signature calculation engine 113 includes signature calculation functionality for the data streams received by the system 100. The signatures may be calculated by the computerized signature calculation engine 113 based on various conventional hash-based signature schemes, including Merkle, Spooky, CRC, MD5, SHA or similar schemes. The computerized signature calculation engine 113 may be configured to perform computerized signature calculations using computerized sub-block signature calculations, computerized similarity detection calculations based on Rabin signatures and/or other similarity-based computerized signature calculations on the data streams received by the system 100. According to one embodiment, the computerized signature calculation engine 113 can use the fingerprint data generated by the computerized fingerprint calculation engine 110 to generate signatures. In one embodiment, upon receipt of a data stream, the buffer management controller 112 may be configured to communicate signals to the computerized signature calculation engine 113 to process the data stored in the data input buffer 112-1 upon receipt. The computerized signature calculation engine 113 may be configured to compute multiple signatures for subsets of data at a time for different portions of an input data stream.
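As an illustration of the sub-block signature calculations mentioned above, the following C sketch computes one signature per fixed-size sub-block of an input block. The use of FNV-1a, the 512-byte sub-block size and the function names are assumptions made for this example; the engine 113 may instead use Merkle, Spooky, CRC, MD5, SHA or Rabin-based schemes as described above.

```c
#include <stddef.h>
#include <stdint.h>

#define SUB_BLOCK_SIZE 512   /* assumed sub-block granularity */

/* FNV-1a 64-bit hash, standing in for the signature schemes named above. */
uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Compute one signature per sub-block of an input block (e.g., a 4 KB block).
 * Returns the number of signatures written to sig[]. */
size_t sub_block_signatures(const uint8_t *block, size_t len,
                            uint64_t *sig, size_t max_sig)
{
    size_t count = 0;
    for (size_t off = 0; off < len && count < max_sig; off += SUB_BLOCK_SIZE) {
        size_t n = (len - off < SUB_BLOCK_SIZE) ? (len - off) : SUB_BLOCK_SIZE;
        sig[count++] = fnv1a64(block + off, n);
    }
    return count;
}
```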
The signatures computed by the computerized signature calculation engine 113 for the subsets can then be communicated to other components of the system 100 for further processing, such as the reference block identification module 114. The signatures computed by the computerized signature calculation engine 113 may have, for example, mathematical properties that cause them to be similar or identical when they are calculated on blocks that are similar or identical to each other. As such, a reference block selected by components of the system 100, such as the reference block identification module 114, may be based on a computed signature that best represents a plurality of similar signature clusters stored in a memory resident on the system 100. Therefore, the components of the system 100 can perform reference block identification procedures using the signatures calculated by the computerized signature calculation engine 113. The reference block identification module 114 may use, for example, sub-block signatures to perform the reference block identification procedures.

The reference block identification module 114 comprises functionality for analyzing a plurality of different signature clusters generated by the computerized signature calculation engine 113 and for selecting the reference blocks which may be processed by other components of the system 100, such as the hash table module 111. The reference block identification module 114 may be configured to compare the calculated signatures with the signature clusters currently stored by the system 100 and accordingly select a reference block that best represents the calculated signature. The reference block identification module 114 may be configured, for example, to compare the calculated signatures with the signature clusters currently stored in a buffer generated by the buffer management controller 112 and accordingly select a reference block that best represents the calculated signature. The reference blocks selected by the reference block identification module 114 may be stored within the buffers generated by the buffer management controller 112, such as the reference block buffer 112-3, for further processing by components of the system 100.

The reference blocks may be normal data blocks that have been found to be similar to the input data by various methods. The reference blocks may, for example, be normal data blocks that have been found to be similar to the input data using calculated sub-block signatures, similarity detection mechanisms, application hint detection schemes or similar schemes. The reference blocks may also be purely synthetic blocks containing repetitive data sequences that have been found to have higher repetition factors. In accordance with one embodiment, the reference block identification module 114 may be configured to identify the reference blocks using prior knowledge, content similarity matching, application hints, data pattern recognition or similar means. In addition, information about the reference blocks, such as a reference block stored within the reference block buffer 112-3 identified by the reference block identification module 114, can be stored in the header portion of a data stream. Referring to Figure 1C, for example, the reference block identifier for a reference block identified by the reference block identification module 114 may be stored in the header portion 116a of the data frame 116.
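The following C sketch illustrates one way the selection described above could be performed in software: the candidate whose stored sub-block signatures share the most values with the input block's signatures is chosen. The linear scan, the fixed number of signatures per candidate and the identifier names are assumptions made for the example rather than details taken from the patent.

```c
#include <stddef.h>
#include <stdint.h>

#define SIGS_PER_BLOCK 8   /* assumed number of sub-block signatures kept per block */

typedef struct {
    int      block_id;                 /* identifies a stored candidate reference block */
    uint64_t sig[SIGS_PER_BLOCK];      /* its previously computed sub-block signatures  */
} candidate_t;

/* Return the block_id of the candidate sharing the most signatures with the
 * input block's signatures, or -1 if no candidate shares any signature. */
int select_reference_block(const uint64_t *in_sig, size_t in_count,
                           const candidate_t *cand, size_t cand_count)
{
    int best_id = -1;
    size_t best_hits = 0;

    for (size_t c = 0; c < cand_count; c++) {
        size_t hits = 0;
        for (size_t i = 0; i < in_count; i++)
            for (size_t j = 0; j < SIGS_PER_BLOCK; j++)
                if (in_sig[i] == cand[c].sig[j])
                    hits++;
        if (hits > best_hits) {
            best_hits = hits;
            best_id = cand[c].block_id;
        }
    }
    return best_id;
}
```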
As illustrated in Figure 1C, the header data 116a may be included within a set of data grains, such as data grains 116-1, 116-2 and 116-N, together with their respective compressed payload data portions, such as the compressed payload 116b. In one embodiment, the header data 116a may store a reference identifier 117-1 in addition to the binary vector 117-2, the grain count 117-3 and/or the header CRC data 117-4.

Referring to Figure 1B, the hash table module 111 includes hash value calculation and dynamic hash table generation functionality based on the data associated with the data streams received by the system 100. Upon receipt of a data stream, the buffer management controller 112 may communicate signals to the hash table module 111 to process the data stored in the data input buffer 112-1 and/or the reference block buffer 112-3 as data is received by the buffer. The hash table module 111 includes hash value calculation functionality for subsets of data, such as data bytes, associated with a data stream received by the system 100, which can be stored within a generated hash table. The hash table module 111 may, for example, calculate the hash values for the data bytes associated with a data stream received by the system 100. As such, the hash table module 111 can be used by the most common high-performance compression schemes in a manner that speeds up the search for repeated data sequences. The hash table module 111 can be used, for example, by popular high-performance compression schemes including Snappy, Lempel-Ziv (LZ), Gzip or similar compression schemes.

[0037] Data subsets may have a predetermined fixed size and may be used to represent larger files for performing deduplication procedures. The hash table module 111 can thus calculate a hash value for each byte of data received by the system 100. In this way, the hash table module 111 can calculate the hash values for the subsets of data simultaneously with their reception and storage within a buffer generated by the buffer management controller 112. In addition, the computerized hashing calculations can be performed in a manner consistent with the manner in which other components of the system 100, such as the computerized fingerprint calculation engine 110, calculate hash values, or in a different manner.

[0038] In accordance with one embodiment, the hash table module 111 includes functionality for dynamically generating the reference hash table based on the data reference blocks identified by the reference block identification module 114. Once selected by the reference block identification module 114, the data blocks corresponding to the reference blocks can be stored in a reference block buffer, such as the reference block buffer 112-3. As the reference blocks are stored, the hash table module 111 may be configured to calculate the sliced hash values that correspond to the reference blocks. Thus, the hash table module 111 can generate pre-computed hash tables that can speed up the performance of the compression and deduplication procedures performed by the system 100. Referring to Figure 1B, for example, when a set of bytes is received by the system 100 and stored in the data input buffer 112-1 resident on the system 100, the hash table module 111 may calculate the hash values for the reference blocks determined and/or selected by the reference block identification module 114 as corresponding to the received byte set.
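Returning briefly to the frame format of Figure 1C, the following C sketch shows one possible in-memory layout of the header fields named above (reference identifier 117-1, binary vector 117-2, grain count 117-3 and header CRC 117-4). The field widths and the checksum routine are assumptions; the patent text names the fields but does not specify their sizes or the CRC polynomial.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the header data 116a (field widths are illustrative). */
typedef struct {
    uint32_t reference_id;   /* 117-1: identifies the reference block used        */
    uint32_t grain_bitmap;   /* 117-2: binary vector, one bit per data grain      */
    uint16_t grain_count;    /* 117-3: number of grains in the frame              */
    uint16_t header_crc;     /* 117-4: header CRC (placeholder computation below) */
} frame_header_t;

/* Placeholder 16-bit checksum standing in for the header CRC; assumes the
 * struct has no padding, so the last 2 bytes are the CRC field itself. */
static uint16_t header_checksum(const frame_header_t *h)
{
    const uint8_t *p = (const uint8_t *)h;
    uint16_t sum = 0;
    for (size_t i = 0; i + 2 < sizeof(*h); i++)
        sum = (uint16_t)((sum << 1) ^ p[i]);
    return sum;
}

void frame_header_fill(frame_header_t *h, uint32_t ref_id,
                       uint32_t bitmap, uint16_t grains)
{
    h->reference_id = ref_id;
    h->grain_bitmap = bitmap;
    h->grain_count  = grains;
    h->header_crc   = 0;
    h->header_crc   = header_checksum(h);
}
```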
As the reference data blocks are stored in the reference data block buffer 112-3, which has been dynamically generated by the buffer management controller 112, the hash table module 111 calculates the corresponding reference block hash values. In this manner, the buffer management controller 112 includes functionality for creating reference data block buffers that operate in parallel with the resident data input buffers on the system 100, such as the data input buffer 112-1. These calculated reference block hash values can then be stored in the reference hash table 111-1 generated by the hash table module 111. The hash table module 111 also includes functionality for dynamically generating compression hash tables using a data stream received by the system 100 and/or stored in the data input buffers.

In addition, the hash table module 111 includes functionality for modifying and/or generating coded data that can be used to decompress and/or subsequently reconstruct the data streams previously processed by the system 100. In this manner, the hash table module 111 may be configured to modify and/or encode the header data when identifying similar data sequences during the compression operations. The hash table module 111 can thus generate coded data containing a reference identifier that corresponds to stored data previously identified by the hash table module 111. The hash table module 111 may, for example, generate and/or modify coded header data that contains the number of uncompressed data bytes identified by the hash table module 111, such as the number of literals identified, upon completion of the computerized hash calculation procedures. In this manner, the coded data generated by the hash table module 111 can provide instructions as to how the decompression module can decompress or decode a literal and/or copy elements that correspond to a set of bytes associated with a data stream subject to decompression procedures. The copy elements can include the number of bytes to copy (the "length") and/or how far back the data to be copied is located (the "offset"). In one embodiment, for example, the header data generated and/or modified by the hash table module 111 may include a representation of the identified literals and a corresponding literal data sequence. As such, the decompression module 108 can read the coded and/or modified header information that provides instructions on how the module can decompress the sequence of literals. In addition, the decompression module 108 may be configured to perform the decompression procedures based on various compression schemes such as Snappy, LZ, Gzip or similar compression schemes.

In accordance with one embodiment, provided that at least one reference block is selected and designated to be stored in a reference block buffer, the hash table module 111 may send signals to the components of the system 100 to perform hash table lookup and/or header modification procedures using the reference hash table and/or the compression hash table for further processing based on the calculated hash values. In this manner, the hash table module 111 can create a mutual interlock between the computerized calculations of the reference hash table and the start of the compression procedures.
In addition, the computerized hash calculation procedures performed by the hash table module 111 for the compression hash table and the reference hash table may be the same computer-implemented procedures or functions, or different computer-implemented procedures or functions. Table I contains an example set of header formats, or modifications of the copy encoding format with a prior reference, capable of being used by embodiments of the present invention.

Compressed header | Meaning
00 | Literal, max. 60 bytes
01 | Local copy, 3-bit length, 11-bit offset
10 | Local copy, 6-bit length, 12-bit offset
11 | Reference copy, 12-bit length, 12-bit offset
Table I

[0045] The scan and match engine 109 includes functionality for performing hash table lookup procedures in order to perform hash value comparisons. The scan and match engine 109 includes functionality for transmitting signals to and/or receiving signals from the hash table module 111 in order to perform computer-implemented lookup procedures that compare the hash values calculated for the subsets of data to the data reference blocks currently stored by the system 100. The scan and match engine 109 may use the hash table lookup logic to locate the calculated hash values within the hash tables generated by the hash table module 111 and compare the data. The hash table module 111 may, for example, generate the reference hash table 111-1 and the compression hash table 111-2 and perform comparison operations. As such, the scan and match engine 109 may be configured to check the hash values calculated for a subset of bytes against the data reference blocks currently stored by the system 100 in the buffers generated by the buffer management controller 112, such as the reference block buffer 112-3. In this manner, the scan and match engine 109 may conduct parallel or concurrent searches in both a reference hash table and a compression hash table created by the hash table module 111. When performing such lookup procedures, the scan and match engine 109 may also perform procedures for comparing a next set of bytes received by the system 100 to the stored reference data block and/or compression hash values that correspond to the data previously identified by the hash table module 111.

Referring to Figure 1D, for example, when the reference block 118 is identified by the reference block identification module 114, the hash table module 111 stores in the reference hash table 111-1 calculated hash values that correspond to portions of the reference block 118 (for example, the values of the data subsets 118-1, 118-2, 118-3, 118-4, etc. of the reference block) as stored in a reference block buffer. In this manner, the system 100 can use the reference data buffer fill time to calculate and store the sliced hash function values of the reference data corresponding to the reference block 118, which improves the performance of the compression and deduplication procedures performed by the system 100. In addition, as illustrated in Figure 1D, the system 100 may also receive input data blocks 120 associated with an incoming data stream. As such, the scan and match engine 109 may use the hash table logic 109-3 to perform parallel lookup procedures using the reference hash table 111-1 and the compression hash table 111-2 to identify the previously stored data sequences that are similar to the received data blocks 120.
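For illustration, the following C sketch packs the four element types of Table I above into bytes. Table I only specifies the 2-bit tags and the field widths, so the exact placement of the length and offset bits within the following bytes is an assumption made for this example (loosely modeled on Snappy-style tag bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* 2-bit element tags from Table I (stored in the low two bits of the first byte). */
enum { TAG_LITERAL = 0x0, TAG_LOCAL_SHORT = 0x1,
       TAG_LOCAL_LONG = 0x2, TAG_REFERENCE = 0x3 };

/* "00": literal, up to 60 bytes; length kept in the upper 6 bits of the tag byte. */
size_t emit_literal(uint8_t *out, const uint8_t *data, size_t len)
{
    out[0] = (uint8_t)(TAG_LITERAL | ((len & 0x3Fu) << 2));
    for (size_t i = 0; i < len; i++)
        out[1 + i] = data[i];
    return 1 + len;
}

/* "01": local copy, 3-bit length and 11-bit offset packed into 2 bytes. */
size_t emit_local_short(uint8_t *out, unsigned len, unsigned offset)
{
    out[0] = (uint8_t)(TAG_LOCAL_SHORT | ((len & 0x7u) << 2) | (((offset >> 8) & 0x7u) << 5));
    out[1] = (uint8_t)(offset & 0xFFu);
    return 2;
}

/* "10": local copy, 6-bit length and 12-bit offset packed into 3 bytes. */
size_t emit_local_long(uint8_t *out, unsigned len, unsigned offset)
{
    out[0] = (uint8_t)(TAG_LOCAL_LONG | ((len & 0x3Fu) << 2));
    out[1] = (uint8_t)(offset & 0xFFu);
    out[2] = (uint8_t)((offset >> 8) & 0x0Fu);
    return 3;
}

/* "11": reference copy, 12-bit length and 12-bit offset into the reference block. */
size_t emit_reference(uint8_t *out, unsigned len, unsigned offset)
{
    out[0] = (uint8_t)(TAG_REFERENCE | ((len & 0x3Fu) << 2));
    out[1] = (uint8_t)((len >> 6) & 0x3Fu);
    out[2] = (uint8_t)(offset & 0xFFu);
    out[3] = (uint8_t)((offset >> 8) & 0x0Fu);
    return 4;
}
```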
The scan and match engine 109 may thus perform byte-by-byte comparisons using smaller subsets of data (e.g., the data subset 120-1 of the data block 120) and the reference data blocks. If the scan and match engine 109 detects a match between an entry in the reference hash table 111-1 and/or the compression hash table 111-2 and the hash value calculated for the block 120, the scan and match engine 109 may then send signals to the decompression module 108 to decompress the subset of data within the reference block buffer or data input buffer using modified compression header formats, such as the modifications to the prior-reference encoding format described herein. As a result, the decompressed output can then be stored in a buffer generated by the buffer management controller 112, such as the data output buffer 112-2.

In one embodiment, during the decompression procedures, the decompression module 108 may be configured to select one of a plurality of different sequences when the scan and match engine 109 detects a match in the reference hash table 111-1 and/or in the compression hash table 111-2. Based on a predetermined heuristic, for example, the decompression module 108 may be configured to decompress the data in the form of literals, local copies and/or reference copies. In this way, at decompression, the system 100 can create similar reference data input buffers so that the implementation of a decompression can be modified to interpret the prior references from either an input data stream or a reference block buffer.

[0052] As such, the decompression module 108 may be configured to process the literal scan logic 109-1 and/or the local copy scan logic 109-2 used by the scan and match engine 109. It should be appreciated that the embodiments of the present invention are not limited to the use of a single reference block. Embodiments may be expanded to include multiple reference blocks with simple modifications to existing data paths and frame structures. Embodiments may, for example, be extended to comparisons of multiple reference blocks made in parallel. In addition, the hash table module 111 may be configured to generate multiple reference hash tables, each corresponding to a respective reference block of a set of different reference blocks. In addition, multiple reference blocks may be stored within a single reference hash table generated by the hash table module 111.

In addition, the system 100 may be configured for early detection and prediction of the compressibility of the blocks prior to performing a data reduction operation such as that described herein, in order to minimize unnecessary effort and avoid a decrease in the overall performance of the system. The decompression module 108 includes, for example, functionality for performing grouping procedures on the data received by the system 100. As such, the decompression module 108 may include data grouping logic 108-1 that allows the decompression module 108 to group the incoming data, received through the data input buffer 112-1, into subsets of data bytes, or "slices", that can be processed or operated on in a single instance. Thus, the hash table module 111 can calculate hash values on overlapping data slices selected by the decompression module 108 through the data grouping logic 108-1.
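Consistent with the slice grouping described above, the following C sketch shows how a compression hash table indexed by slice hashes could store slice offsets: each overlapping slice is hashed, the hash is used as a table address, and the offset of the most recent slice that hashed there is recorded. The 4-byte slice width, table size and hash function are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS   14
#define HASH_SIZE   (1u << HASH_BITS)
#define SLICE_BYTES 4               /* assumed width of an overlapping slice */
#define NO_ENTRY    0xFFFFFFFFu

/* Compression hash table: each bucket holds the block offset of the most
 * recent slice that hashed to it (in the role of table 111-2). */
typedef struct {
    uint32_t offset[HASH_SIZE];
} comp_hash_table_t;

static uint32_t slice_hash(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, SLICE_BYTES);
    return (v * 2654435761u) >> (32 - HASH_BITS);   /* Knuth multiplicative hash */
}

void comp_table_reset(comp_hash_table_t *t)
{
    for (size_t i = 0; i < HASH_SIZE; i++)
        t->offset[i] = NO_ENTRY;
}

/* Probe the table for the current slice, then record this slice's offset.
 * Returns the offset of a prior slice with the same hash (a potential match
 * to verify byte by byte), or NO_ENTRY if the bucket was empty. */
uint32_t comp_table_probe_insert(comp_hash_table_t *t,
                                 const uint8_t *block, uint32_t pos)
{
    uint32_t h = slice_hash(block + pos);
    uint32_t prior = t->offset[h];
    t->offset[h] = pos;
    return prior;
}
```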
The hash values calculated by the hash table module 111 for the overlapping slices can be used as memory address locations that represent the locations where the slice offset values are stored within data structures, such as the compression hash table 111-2 and/or memory resident on the system 100. In addition, the scan and match engine 109 may use the hash table module 111 to locate the calculated slices and, in parallel, perform comparison operations on the data blocks as they are written into the data input buffer 112-1. By using the compression hash table 111-2, for example, the scan and match engine 109 can detect the occurrence of a "hash hit" if it determines that a hash value calculated for a slice related to an incoming data set shares the same signature as a hash value stored in the compression hash table 111-2. In this manner, the scan and match engine 109 can detect the occurrence of a hash hit when two slices have identical or similar signatures calculated by the computerized signature calculation engine 113.

In addition, the scan and match engine 109 includes functionality for sending signals to the decompression module 108 to increment a compressibility counter, such as the hash hit counter 111-3. In this manner, the hash hit counter 111-3 can be incremented each time the scan and match engine 109 detects the occurrence of a hash hit. The hash hit counter 111-3 allows the system 100 to keep track of the hash values that occur frequently within an incoming data set received by the system 100. Therefore, at the end of a data transfer into the data input buffer 112-1, the system 100 can store a set of hashes calculated for a complete data set.

In addition, the system 100 may be configured to store frequent hash value match thresholds, which allows it to better determine which data blocks would benefit most from being subjected to data reduction procedures (e.g., data deduplication procedures, reference block identification procedures, data compression procedures, etc.). In this way, the system 100 may be configured in a manner that allows it to automatically interpret the compressibility characteristics using predetermined threshold values and/or calculated compressibility counts. For example, before the system 100 performs any data reduction procedure, it can first refer to a predetermined threshold count and decide whether to perform, stop and/or suspend a data reduction operation. Thus, components of the system 100, such as the decompression module 108, may generate an instruction or set of instructions that instruct the components of the system 100 to initiate the execution of a data reduction operation (for example, data deduplication procedures, reference block identification procedures, data compression procedures, etc.) when the counter reaches or exceeds a frequent hash value match threshold. Conversely, the components of the system 100 may generate an instruction or set of instructions that instruct the components of the system 100 to refrain from performing a data reduction operation when the counter fails to reach the frequent hash value match threshold. Such determinations by the system 100 not only save cycles of the host CPU, but also allow the data to move through the system without interrupting other handlers, such as host handlers.
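A minimal, self-contained C sketch of this compressibility decision is shown below: hash hits are counted over one block and compared against the frequent hash value match threshold to choose between a "compression" and a "compression bypass" signal. The slice width, hash, table size and threshold handling here are illustrative assumptions, not details from the patent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EHASH_BITS 12
#define EHASH_SIZE (1u << EHASH_BITS)

/* Signal passed to the rest of the data-reduction pipeline. */
typedef enum { SIGNAL_COMPRESS, SIGNAL_COMPRESSION_BYPASS } reduce_signal_t;

/* Count hash hits (slices whose value was already seen in this block) and
 * compare the count to the frequent hash value match threshold. */
reduce_signal_t entropy_decision(const uint8_t *block, size_t len,
                                 uint32_t hit_threshold)
{
    uint32_t seen[EHASH_SIZE];
    uint8_t  valid[EHASH_SIZE] = {0};
    uint32_t hit_counter = 0;              /* plays the role of counter 111-3 */

    for (size_t pos = 0; pos + 4 <= len; pos++) {
        uint32_t v, h;
        memcpy(&v, block + pos, 4);        /* overlapping 4-byte slice */
        h = (v * 2654435761u) >> (32 - EHASH_BITS);
        if (valid[h] && seen[h] == v)
            hit_counter++;                 /* slice repeats earlier content */
        seen[h]  = v;
        valid[h] = 1;
    }

    /* Frequent matches suggest redundant, low-entropy data worth compressing;
     * otherwise signal a bypass so no compression cycles are spent on it. */
    return (hit_counter >= hit_threshold) ? SIGNAL_COMPRESS
                                          : SIGNAL_COMPRESSION_BYPASS;
}
```

In a hardware realization such as the FPGA controller mentioned above, the same counting would typically be performed by the scan and match logic while the block streams into the input buffer, rather than in a separate software pass.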
In one embodiment, for example, if the value of the hash hit counter 111-3 is less than a predetermined threshold value, the decompression module 108 may determine that the currently analyzed data blocks present low compressibility characteristics, thus demonstrating a high entropy level for at least a portion of the data stream. Therefore, in response to this determination, the decompression module 108 may be configured not to perform any decompression operation. In this manner, the decompression module 108 may be configured to send instructions that stop and/or suspend the execution of the decompression operations. However, if the value of the hash hit counter 111-3 is equal to or greater than the predetermined threshold value, the decompression module 108 can determine that the data blocks have high compressibility characteristics, thus demonstrating a low level of entropy for at least a portion of the data stream. Therefore, in response to this determination, the decompression module 108 may be configured to send instructions that initialize the execution of a decompression operation. In this manner, the decompression module 108 uses the compressibility factors to determine whether to provide "compression" or "compression bypass" signals to the other components of the system 100 for a given set of bytes related to an incoming data set stored in the data input buffer 112-1.

In this manner, the system 100 can measure the entropy of the data sets stored in the data input buffer 112-1 based on the frequency of the similarities detected between the data blocks of a given data set. In accordance with one embodiment, the scan and match engine 109 may calculate the frequency of the hash hits using histogram representations of the data. In addition, the hash hit counter 111-3 can be implemented in hardware or software. In addition, the system 100 may also be configured to dynamically adjust the threshold values based on system load and/or user preferences. In this way, the threshold for compression can be relaxed to increase the compression ratio at the expense of power and latency. Likewise, higher threshold values can be used to achieve lower average latencies.

Figure 2A is a process diagram of a first portion of an example of a one-pass entropy detection process in accordance with embodiments of the present invention. In step 205, an input data stream is received by the system and stored in a data input buffer. Upon receipt of the data stream, the decompression module uses the data grouping logic to group a plurality of subsets of data found within the input data stream. The size of the subsets may be predetermined and have a fixed value. In step 206, using the fingerprint data generated by the computerized fingerprint calculation engine 110 for the data stored in the data input buffer, the computerized signature calculation engine calculates a first signature for a first grouped subset of data within the data stream as stored during step 205. In step 207, the hash table module calculates a first hash value for the first grouped subset of data and compares the calculated hash value with a hash value stored in the hash table so as to detect a match. In step 208, the hash table module calculates a second hash value for a second grouped data subset and compares the calculated hash value with a hash value stored in a hash table so as to detect a match.
In step 209, the hash table module calculates a hash value for a grouped data subset and compares the calculated hash value with a hash value stored in a hash table so as to detect a match. In step 210, the decompression module monitors the matches detected by the hash table module and increments a counter accordingly for each detected match.

Figure 2B is a process diagram of a second portion of an exemplary one-pass entropy detection process in accordance with embodiments of the present invention. The details of operation 210 (see Figure 2A) are depicted in Figure 2B. In step 211, the decompression module determines an entropy level for a portion of the input data stream based on a value of the counter with respect to a predetermined frequent hash value match threshold. In step 212, the decompression module determines whether the hash value matching threshold has been reached or exceeded. If the decompression module detects that the hash value matching threshold has been reached or exceeded, the decompression module determines a low entropy level for a portion of the input data stream and therefore communicates signals to the system components for initiating the execution of the data reduction operations, as described in detail in step 213. If the decompression module detects that the hash value matching threshold has not been reached, the decompression module determines a high entropy level for a portion of the input data stream and therefore communicates signals to the system components to stop the execution of the data reduction operations, as described in detail in step 214. In step 213, the decompression module detects that the hash value matching threshold has been reached or exceeded; therefore, the decompression module determines a low entropy level for a portion of the input data stream and thereby communicates signals to the system components to initiate the execution of the data reduction operations. In step 214, the decompression module detects that the hash value matching threshold has not been reached; therefore, the decompression module determines a high entropy level for a portion of the input data stream and thereby communicates signals to the system components to stop the execution of the data reduction operations.

Figure 3A is a process diagram of an example of a simultaneous data deduplication and compression process in accordance with embodiments of the present invention. The details of operation 213 (see Figure 2B) are described in Figure 3A. In step 215, the reference block identification module compares a signature computed during step 206 with the signature clusters currently stored by the system and accordingly selects a reference block that best represents the calculated signature. The reference block selected by the reference block identification module is stored in the reference block buffer for further processing by the system. In step 216, as the reference block is stored in step 215, the hash table module calculates the sliced hash values corresponding to the reference block. In step 217, the hash values calculated during step 216 are stored in a reference hash table generated by the hash table module, provided that the hash values are not already stored in the reference hash table. In step 218, provided that at least one reference block is stored in the reference block buffer, the hash table module sends signals to the scan and match engine to perform hash table lookup and/or header modification procedures using the reference hash table and/or the compression hash table for further processing based on the hash values calculated during steps 207, 208 and/or 209.
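Steps 216-217 above can be illustrated with the following C sketch, which computes sliced hash values over a just-selected reference block and records their offsets in a reference hash table while the reference block buffer is being filled, so that the table is ready before compression of the input block begins. The slice width, table size and hash are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RHASH_BITS 14
#define RHASH_SIZE (1u << RHASH_BITS)
#define RNO_ENTRY  0xFFFFFFFFu

/* Reference hash table (in the role of 111-1): bucket -> offset of a slice
 * inside the reference block buffer (in the role of 112-3). */
typedef struct {
    uint32_t ref_offset[RHASH_SIZE];
} ref_hash_table_t;

static uint32_t ref_slice_hash(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, 4);                   /* overlapping 4-byte slice */
    return (v * 2654435761u) >> (32 - RHASH_BITS);
}

/* Intended to run while the reference block is copied into its buffer, so the
 * table is already populated when compression of the input block starts. */
void ref_table_fill(ref_hash_table_t *t, const uint8_t *ref_block, size_t len)
{
    for (size_t i = 0; i < RHASH_SIZE; i++)
        t->ref_offset[i] = RNO_ENTRY;

    for (size_t pos = 0; pos + 4 <= len; pos++)
        t->ref_offset[ref_slice_hash(ref_block + pos)] = (uint32_t)pos;
}
```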
Figure 3B is a process diagram of an exemplary process for performing the hash table lookup procedures in accordance with embodiments of the present invention. The details of operation 218 (see Figure 3A) are depicted in Figure 3B. In step 219, the scan and match engine determines whether or not it has detected a match between a calculated hash value and an entry stored exclusively in the reference hash table. If the scan and match engine determines that a match has been detected, the scan and match engine then compares, byte by byte, the subset of data associated with the hash value to the reference block stored in the reference block buffer associated with the matching entry, as described in detail in step 220. If the scan and match engine determines that no match has been detected, the scan and match engine then determines whether or not it has detected a match between a calculated hash value and an entry stored exclusively in the compression hash table, as described in detail in step 221.

In step 220, the scan and match engine has determined that a match has been detected; therefore, the scan and match engine compares, byte by byte, the subset of data associated with the hash value to the reference block stored in the reference block buffer associated with the matching entry and accordingly sends signals to the decompression module to decompress the subset of data within the reference block buffer using a modified compression header format for reference copies, such as "11". The decompressed output is stored in the data output buffer.

In step 221, the scan and match engine has determined that no match has been detected; therefore, the scan and match engine determines whether or not it has detected a match between a calculated hash value and an entry stored exclusively in the compression hash table. If the scan and match engine determines that a match has been detected, the scan and match engine then compares, byte by byte, the subset of data associated with the hash value with the data currently stored in the data input buffer, as described in detail in step 222. If the scan and match engine determines that no match has been detected, the scan and match engine then determines whether or not it has detected a match between a calculated hash value and an entry stored in both the reference hash table and the compression hash table, as described in detail in step 223.

In step 222, the scan and match engine has determined that a match has been detected; therefore, the scan and match engine compares, byte by byte, the subset of data associated with the hash value to the data currently stored in the data input buffer and accordingly sends signals to the decompression module to decompress the subset of data within the data input buffer using a modified compression header format for local copies, such as "01" or "10", based on the correct bit length and offset. The decompressed output is stored in the data output buffer.

In step 223, the scan and match engine has determined that no match has been detected; therefore, the scan and match engine determines whether or not it has detected a match between a calculated hash value and an entry stored in both the reference hash table and the compression hash table. If the scan and match engine determines that a match has been detected, the scan and match engine then compares, byte by byte, the subset of data associated with the hash value with the data currently stored in the data input buffer and sends signals to the decompression module to decompress the subset of data within the data input buffer based on the predetermined procedures, as described in detail in step 224.

In step 224, the scan and match engine has determined that a match has been detected; therefore, the scan and match engine compares, byte by byte, the subset of data associated with the hash value to the data currently stored in the data input buffer and accordingly sends signals to the decompression module to decompress the subset of data within the data input buffer based on the predetermined procedures. In accordance with one embodiment, the predetermined procedures may include having the scan and match engine bias its selection of decompression procedures toward local matches or reference matches, depending on the length of the copy and/or other knowledge of the data associated with the data stream.

In step 225, the scan and match engine has determined that no match has been detected; therefore, the calculated hash value is stored in the compression hash table generated by the hash table module. In step 226, the scan and match engine communicates signals to the decompression module to decompress the subset of data stored in the data input buffer using a modified compression header format for literal sequences, such as "00". The decompressed output is stored in the data output buffer.
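The decision flow of Figure 3B can be summarised by the following C sketch, which maps the lookup outcomes to the three element classes of Table I. The byte-by-byte verification and the actual emission are abstracted as boolean inputs, and the copy-length bias shown for the both-tables case is only one possible instance of the "predetermined procedures" of step 224.

```c
#include <stdbool.h>

/* Outcome of one lookup round (steps 219-226). */
typedef enum {
    EMIT_REFERENCE_COPY,   /* "11": match verified against the reference block buffer */
    EMIT_LOCAL_COPY,       /* "01"/"10": match verified against earlier input data     */
    EMIT_LITERAL           /* "00": no usable match; the hash is stored for later      */
} emit_kind_t;

/* One possible step 224 heuristic (an assumption): prefer whichever match
 * gives the longer copy; on a tie keep the local copy, whose encoding under
 * Table I is shorter (2-3 bytes versus 4). */
bool prefer_reference_match(unsigned ref_copy_len, unsigned local_copy_len)
{
    return ref_copy_len > local_copy_len;
}

/* ref_hit / comp_hit: the calculated hash matched an entry in the reference
 * hash table / compression hash table.  ref_verified / comp_verified: the
 * byte-by-byte comparison confirmed the corresponding match. */
emit_kind_t classify_match(bool ref_hit, bool ref_verified,
                           bool comp_hit, bool comp_verified,
                           bool prefer_reference)
{
    if (ref_hit && ref_verified && (!comp_hit || !comp_verified || prefer_reference))
        return EMIT_REFERENCE_COPY;          /* roughly steps 219-220 and 223-224 */

    if (comp_hit && comp_verified)
        return EMIT_LOCAL_COPY;              /* roughly steps 221-222 and 223-224 */

    return EMIT_LITERAL;                     /* steps 225-226 */
}
```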
Although some preferred embodiments and methods have been disclosed herein, it will be apparent from the foregoing disclosure to those skilled in the art that variations and modifications of such embodiments and methods can be made without departing from the spirit and scope of the invention.

[0089] According to one embodiment, the techniques described here can be implemented by one or more special-purpose computer processing devices. The special-purpose computer processing devices may be hard-wired to execute the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs) that are permanently programmed to execute the techniques, or may include one or more general-purpose physical processors programmed to execute the techniques in accordance with program instructions in firmware, memory, other storage, or a combination thereof. Such special-purpose computing devices may also combine custom hard-wired logic, ASICs or FPGAs with custom programming to accomplish the techniques. The special-purpose computer processing devices may be database servers, storage devices, desktop computer systems, portable computer systems, portable devices, networking peripherals or any other device that incorporates hard-wired and/or programmable logic to implement the techniques.

In the foregoing detailed description of embodiments of the present invention, many specific details have been presented to provide a thorough understanding of the present invention. However, one skilled in the art will recognize that the present invention can be implemented without these specific details.
In other circumstances, well known methods, procedures, components and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention.  Although a method can be described as a sequence of numbered steps for the sake of clarity, the numbering does not necessarily dictate the order of the steps.  It should be understood that certain steps can be skipped, executed in parallel or executed without it being absolutely necessary to maintain a strict sequence order.
The drawings which represent the embodiments of the invention are semi-schematic and are not to scale, and certain dimensions are particularly indicated for the sake of clarity of presentation and are shown exaggerated in the drawing figures. Likewise, although the views in the drawings generally represent similar orientations to facilitate the description, this representation in the figures is for the most part arbitrary.
Claims:
Claims (21)
[0001]
1. An apparatus comprising: a memory unit configured to store a data stream; and a processor connected to said memory unit, said processor being configured to detect the entropy of an input data stream during a single pass, said processor being usable for grouping a plurality of subsets of data from said input data stream, for calculating a first hash value corresponding to a first grouped subset of data, for detecting a match between said first hash value and a second hash value stored in a hash table, and for monitoring a hash value matching frequency of said input data stream, said processor being operable to increment a counter value in response to a detection of said match and to determine a level of entropy for a portion of said input data stream based on said counter value with respect to a frequent hash value matching threshold, and for generating an instruction for either initializing the execution of a data compression operation when said counter value reaches or exceeds said frequent hash value matching threshold, or abstaining from said execution of said data compression operation when said counter value fails to reach said frequent hash value matching threshold.
[0002]
An apparatus according to claim 1, wherein said instruction to initiate said execution of said data compression operation results in an output comprising a compressed portion of said input data stream.
[0003]
An apparatus according to claim 1, wherein said instruction to abstain from said execution of said data compression operation results in an output comprising an uncompressed portion of said input data stream.
[0004]
An apparatus according to claim 1, wherein said processor can be used to generate an instruction to suspend execution of said data compression operation when said counter value fails to reach said frequent hash value matching threshold.
[0005]
An apparatus according to claim 1, wherein said processor can be used to calculate a signature for each subset of data of said plurality of subsets of data, and said match represents at least two grouped subsets of data related to said input data stream having the same signature.
[0006]
Apparatus according to claim 1, wherein said processor may be used to adjust said frequent hash value matching threshold based on a current system load.
[0007]
An apparatus according to claim 1, wherein said processor can be used to adjust said frequent hash value matching threshold based on a user's preference.
[0008]
A computer-implemented method for detecting the entropy of an input data stream during a single pass, said method comprising: receiving an input data stream; grouping a plurality of subsets of data from said input data stream; calculating a first hash value corresponding to a first grouped subset of data; detecting a match between said first hash value and a second hash value stored in a hash table and incrementing a counter value in response to detecting said match; monitoring a hash value matching frequency of said input data stream; determining an entropy level for a portion of said input data stream based on said counter value with respect to a frequent hash value matching threshold; and generating an instruction for either initiating the execution of a data compression operation when said counter value reaches or exceeds said frequent hash value matching threshold, or abstaining from said execution of said data compression operation when said counter value fails to reach said frequent hash value matching threshold.
[0009]
The computer-implemented method of claim 8, wherein said instruction to initiate said execution of said data compression operation results in an output comprising a compressed portion of said input data stream.
[0010]
The computer-implemented method of claim 9, wherein said instruction to abstain from said execution of said data compression operation results in an output comprising an uncompressed portion of said input data stream.
[0011]
The computer-implemented method of claim 8, wherein said generating further comprises generating an instruction to suspend execution of said data compression operation when said counter value fails to reach said frequent hash value matching threshold.
[0012]
The computer-implemented method of claim 8, wherein said grouping further comprises calculating a signature for each subset of data of said plurality of subsets of data, and said match represents at least two grouped subsets of data related to said input data stream having the same signature.
[0013]
The computer-implemented method of claim 8, further comprising: adjusting said frequent hash value matching threshold based on a current system load.
[0014]
The computer-implemented method of claim 8, further comprising: adjusting said frequent hash value matching threshold based on a user's preference.
[0015]
An apparatus comprising: a memory unit configured to store a data stream; and a processor connected to said memory unit, said processor being configured to detect the entropy of an input data stream during a single pass, said processor being operable to calculate a signature for each subset of data of a plurality of subsets of data from said input data stream, to calculate a first hash value corresponding to a first grouped subset of data, to detect a match between said first hash value and a second hash value stored in a hash table, to monitor a hash value matching frequency of said input data stream, said processor being operable to increment a counter value in response to a detection of said match and to determine an entropy level for a portion of said input data stream based on said counter value with respect to a frequent hash value matching threshold, and to generate an instruction for either initiating the execution of a data reduction operation when said counter value reaches or exceeds said frequent hash value matching threshold, or abstaining from said execution of said data reduction operation when said counter value fails to reach said frequent hash value matching threshold.
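Claim 15 (together with claims 12 and 18) adds a per-subset signature, where a match means at least two grouped subsets carrying the same signature. The sketch below is illustrative only; the SHA-256 signature and the 4 KB subset size are assumptions not taken from the claims.

```python
# Illustrative per-subset signature matching for the claim 15 variant.
# The SHA-256 signature and the 4 KB subset size are assumptions.

import hashlib


def count_signature_matches(stream: bytes, subset_size: int = 4096) -> int:
    """Count grouped subsets whose signature has already been seen; a match
    means at least two subsets of the stream share the same signature."""
    seen: set = set()
    matches = 0
    for pos in range(0, len(stream), subset_size):
        sig = hashlib.sha256(stream[pos:pos + subset_size]).digest()
        if sig in seen:
            matches += 1
        else:
            seen.add(sig)
    return matches
```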
[0016]
Apparatus according to claim 15, wherein said instruction to initiate said execution of said data reduction operation results in an output comprising a compressed portion of said input data stream.
[0017]
An apparatus according to claim 15, wherein said instruction to abstain from said execution of said data reduction operation results in an output comprising an uncompressed portion of said input data stream.
[0018]
Apparatus according to claim 15, wherein said data reduction operation is a data deduplication operation.
[0019]
The apparatus of claim 15, wherein said data reduction operation is a data compression operation.
[0020]
Apparatus according to claim 15, wherein said processor can be used to adjust said frequent hash value matching threshold based on a current system load.
[0021]
Apparatus according to claim 15, wherein said processor may be used to adjust said frequent hash value matching threshold based on a user's preference.
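Claims 6, 7, 13, 14, 20, and 21 recite adjusting the frequent hash value matching threshold based on a current system load or a user preference. One possible adjustment rule is sketched below; the scaling formula is an assumption and is not prescribed by the claims.

```python
# Hypothetical threshold adjustment from system load and user preference
# (claims 6, 7, 13, 14, 20 and 21). The scaling rule is an assumption.

def adjust_threshold(base_threshold: int,
                     system_load: float,
                     user_bias: float = 1.0) -> int:
    """Raise the threshold under heavy load (so fewer blocks are sent to the
    compression path) and let a user preference bias it in either direction."""
    load_factor = 1.0 + max(0.0, min(system_load, 1.0))  # maps load to 1.0-2.0
    return max(1, int(base_threshold * load_factor * user_bias))
```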
Similar technologies:
Publication number | Publication date | Patent title
FR3037677B1|2019-06-14|APPARATUS AND METHOD FOR DETECTION OF ENTROPY IN A PASS ON DATA TRANSFER
FR3037676A1|2016-12-23|APPARATUS AND METHOD FOR COMPRESSION AND DEDUPLICATION ONLINE
US9798731B2|2017-10-24|Delta compression of probabilistically clustered chunks of data
US20130321180A1|2013-12-05|Method of accelerating dynamic huffman decompaction within the inflate algorithm
WO2021027252A1|2021-02-18|Data storage method and apparatus in block chain-type account book, and device
EP3959643A1|2022-03-02|Property grouping for change detection in distributed storage systems
US20150227540A1|2015-08-13|System and method for content-aware data compression
US8750562B1|2014-06-10|Systems and methods for facilitating combined multiple fingerprinters for media
JP2010277522A|2010-12-09|Device for constructing locality sensitive hashing, similar neighborhood search processor, and program
US11119995B2|2021-09-14|Systems and methods for sketch computation
US10938961B1|2021-03-02|Systems and methods for data deduplication by generating similarity metrics using sketch computation
US20210191640A1|2021-06-24|Systems and methods for data segment processing
US10922187B2|2021-02-16|Data redirector for scale out
US20190236283A1|2019-08-01|Data analysis in streaming data
US10833702B1|2020-11-10|Interpolation search to find arbitrary offsets in a compressed stream
Zhuang et al.2015|StoreSim: Optimizing information leakage in multicloud storage services
WO2021127245A1|2021-06-24|Systems and methods for sketch computation
US8880899B1|2014-11-04|Systems and methods for facilitating flip-resistant media fingerprinting
Patent family:
Publication number | Publication date
KR20190014033A|2019-02-11|
FR3037677B1|2019-06-14|
DE102016007364A1|2016-12-22|
KR101945026B1|2019-02-08|
KR20160150029A|2016-12-28|
US20170097960A1|2017-04-06|
CN106257402A|2016-12-28|
KR20200024193A|2020-03-06|
CN106257402B|2021-05-25|
CA2933370C|2021-02-16|
GB2542453B|2017-12-06|
GB2542453A|2017-03-22|
CA2933370A1|2016-12-19|
GB201610255D0|2016-07-27|
AU2016203418A1|2016-06-16|
KR102261811B1|2021-06-04|
US10089360B2|2018-10-02|
US20160371267A1|2016-12-22|
AU2015215975B1|2016-03-17|
JP2017011703A|2017-01-12|
US9552384B2|2017-01-24|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title

US5032987A|1988-08-04|1991-07-16|Digital Equipment Corporation|System with a plurality of hash tables each using different adaptive hashing functions|
US5406279A|1992-09-02|1995-04-11|Cirrus Logic, Inc.|General purpose, hash-based technique for single-pass lossless data compression|
JP2564749B2|1993-03-31|1996-12-18|株式会社富士通ソーシアルサイエンスラボラトリ|Data compression method|
US6278735B1|1998-03-19|2001-08-21|International Business Machines Corporation|Real-time single pass variable bit rate control strategy and encoder|
US6374266B1|1998-07-28|2002-04-16|Ralph Shnelvar|Method and apparatus for storing information in a data processing system|
US6624761B2|1998-12-11|2003-09-23|Realtime Data, Llc|Content independent data compression method and system|
US6195024B1|1998-12-11|2001-02-27|Realtime Data, Llc|Content independent data compression method and system|
US6601104B1|1999-03-11|2003-07-29|Realtime Data Llc|System and methods for accelerated data storage and retrieval|
CN1437738A|2000-01-03|2003-08-20|埃菲克塔技术股份有限公司|Efficient and lossless conversion of data transmission and storage|
US6724817B1|2000-06-05|2004-04-20|Amphion Semiconductor Limited|Adaptive image data compression|
US7289643B2|2000-12-21|2007-10-30|Digimarc Corporation|Method, apparatus and programs for generating and utilizing content signatures|
DE10147755B4|2001-09-27|2004-06-17|Siemens Ag|Methods and devices for header compression in packet-oriented networks|
TWI220959B|2003-06-05|2004-09-11|Carry Computer Eng Co Ltd|Storage device with optimized compression management mechanism|
KR100626719B1|2004-09-01|2006-09-21|주식회사우진넷|Method of transmitting compressed data packet using optimized header|
US7487169B2|2004-11-24|2009-02-03|International Business Machines Corporation|Method for finding the longest common subsequences between files with applications to differential compression|
US8484427B1|2006-06-28|2013-07-09|Acronis International Gmbh|System and method for efficient backup using hashes|
US7885988B2|2006-08-24|2011-02-08|Dell Products L.P.|Methods and apparatus for reducing storage size|
US7970216B2|2006-08-24|2011-06-28|Dell Products L.P.|Methods and apparatus for reducing storage size|
CN101821973B|2007-06-25|2014-03-12|熵敏通讯公司|Multi-format stream re-multiplexer for multi-pass, multi-stream, multiplexed transport stream processing|
KR101503829B1|2007-09-07|2015-03-18|삼성전자주식회사|Device and method for compressing data|
US8819288B2|2007-09-14|2014-08-26|Microsoft Corporation|Optimized data stream compression using data-dependent chunking|
US7937371B2|2008-03-14|2011-05-03|International Business Machines Corporation|Ordering compression and deduplication of data|
JP2010061518A|2008-09-05|2010-03-18|Nec Corp|Apparatus and method for storing data and program|
US8751462B2|2008-11-14|2014-06-10|Emc Corporation|Delta compression after identity deduplication|
US8205065B2|2009-03-30|2012-06-19|Exar Corporation|System and method for data deduplication|
US8706727B2|2009-06-19|2014-04-22|Sybase, Inc.|Data compression for reducing storage requirements in a database system|
US9058298B2|2009-07-16|2015-06-16|International Business Machines Corporation|Integrated approach for deduplicating data in a distributed environment that involves a source and a target|
GB2472072B|2009-07-24|2013-10-16|Hewlett Packard Development Co|Deduplication of encoded data|
US20110093439A1|2009-10-16|2011-04-21|Fanglu Guo|De-duplication Storage System with Multiple Indices for Efficient File Storage|
US8364929B2|2009-10-23|2013-01-29|Seagate Technology Llc|Enabling spanning for a storage device|
US8407193B2|2010-01-27|2013-03-26|International Business Machines Corporation|Data deduplication for streaming sequential data storage applications|
US8427346B2|2010-04-13|2013-04-23|Empire Technology Development Llc|Adaptive compression|
US9613142B2|2010-04-26|2017-04-04|Flash Networks Ltd|Method and system for providing the download of transcoded files|
US8533550B2|2010-06-29|2013-09-10|Intel Corporation|Method and system to improve the performance and/or reliability of a solid-state drive|
KR101725223B1|2011-03-25|2017-04-11|삼성전자 주식회사|Data compressing method of storage device|
US8725933B2|2011-07-01|2014-05-13|Intel Corporation|Method to detect uncompressible data in mass storage device|
US9363339B2|2011-07-12|2016-06-07|Hughes Network Systems, Llc|Staged data compression, including block level long range compression, for data streams in a communications system|
KR20130048595A|2011-11-02|2013-05-10|삼성전자주식회사|Apparatus and method for filtering duplication data in restricted resource environment|
US8542135B2|2011-11-24|2013-09-24|International Business Machines Corporation|Compression algorithm incorporating automatic generation of a bank of predefined huffman dictionaries|
US9047304B2|2011-11-28|2015-06-02|International Business Machines Corporation|Optimization of fingerprint-based deduplication|
US9703796B2|2011-12-06|2017-07-11|Brocade Communications Systems, Inc.|Shared dictionary between devices|
KR101862341B1|2012-01-09|2018-05-30|삼성전자주식회사|Data storage device with data compression function|
KR20130081526A|2012-01-09|2013-07-17|삼성전자주식회사|Storage device, electronic device having the same, and data management methods thereof|
US8650163B1|2012-08-20|2014-02-11|International Business Machines Corporation|Estimation of data reduction rate in a data storage system|
JPWO2014030252A1|2012-08-24|2016-07-28|株式会社日立製作所|Storage apparatus and data management method|
US9087187B1|2012-10-08|2015-07-21|Amazon Technologies, Inc.|Unique credentials verification|
US9035809B2|2012-10-15|2015-05-19|Seagate Technology Llc|Optimizing compression engine throughput via run pre-processing|
US9495552B2|2012-12-31|2016-11-15|Microsoft Technology Licensing, Llc|Integrated data deduplication and encryption|
JPWO2014125582A1|2013-02-13|2017-02-02|株式会社日立製作所|Storage apparatus and data management method|
US20140244604A1|2013-02-28|2014-08-28|Microsoft Corporation|Predicting data compressibility using data entropy estimation|
US8751763B1|2013-03-13|2014-06-10|Nimbus Data Systems, Inc.|Low-overhead deduplication within a block-based data storage|
US9471500B2|2013-04-12|2016-10-18|Nec Corporation|Bucketized multi-index low-memory data structures|
CN104123309B|2013-04-28|2017-08-25|国际商业机器公司|Method and system for data management|
CN103236847B|2013-05-06|2016-04-27|西安电子科技大学|Based on the data lossless compression method of multilayer hash data structure and Run-Length Coding|
US9384204B2|2013-05-22|2016-07-05|Amazon Technologies, Inc.|Efficient data compression and analysis as a service|
US9710166B2|2015-04-16|2017-07-18|Western Digital Technologies, Inc.|Systems and methods for predicting compressibility of data|
US10152389B2|2015-06-19|2018-12-11|Western Digital Technologies, Inc.|Apparatus and method for inline compression and deduplication|
DE102015112143B4|2015-07-24|2017-04-06|Infineon Technologies Ag|A method of determining an integrity of an execution of a code fragment and a method of providing an abstract representation of a program code|
US9971850B2|2015-12-29|2018-05-15|International Business Machines Corporation|Hash table structures|
US10275376B2|2016-03-02|2019-04-30|Western Digital Technologies, Inc.|Efficient cross device redundancy implementation on high performance direct attached non-volatile storage with data reduction|
US10831370B1|2016-12-30|2020-11-10|EMC IP Holding Company LLC|Deduplicated and compressed non-volatile memory cache|
US10282127B2|2017-04-20|2019-05-07|Western Digital Technologies, Inc.|Managing data in a storage system|
US10809928B2|2017-06-02|2020-10-20|Western Digital Technologies, Inc.|Efficient data deduplication leveraging sequential chunks or auxiliary databases|
US10503608B2|2017-07-24|2019-12-10|Western Digital Technologies, Inc.|Efficient management of reference blocks used in data deduplication|
US10528600B1|2017-09-13|2020-01-07|Hrl Laboratories, Llc|System to identify unknown communication behavior relationships from time series|
US11195107B1|2017-09-13|2021-12-07|Hrl Laboratories, Llc|Method of malicious social activity prediction using spatial-temporal social network data|
GB2568165A|2017-10-18|2019-05-08|Frank Donnelly Stephen|Entropy and value based packet truncation|
US10733290B2|2017-10-26|2020-08-04|Western Digital Technologies, Inc.|Device-based anti-malware|
CN108121504B|2017-11-16|2021-01-29|成都华为技术有限公司|Data deleting method and device|
CN110572160A|2019-08-01|2019-12-13|浙江大学|Compression method for decoding module code of instruction set simulator|
US11082168B1|2020-03-19|2021-08-03|Western Digital Technologies, Inc.|Entropy driven endurance for normalized quality of service|
Legal status:
2017-05-11| PLFP| Fee payment|Year of fee payment: 2 |
2018-05-11| PLFP| Fee payment|Year of fee payment: 3 |
2018-11-02| PLSC| Search report ready|Effective date: 20181102 |
2020-05-12| PLFP| Fee payment|Year of fee payment: 5 |
2021-01-15| TP| Transmission of property|Owner name: WESTERN DIGITAL TECHNOLOGIES, INC, US Effective date: 20201209 |
2021-05-13| PLFP| Fee payment|Year of fee payment: 6 |
Priority:
Application number | Filing date | Patent title
US14/744,947|US9552384B2|2015-06-19|2015-06-19|Apparatus and method for single pass entropy detection on data transfer|
US14744947|2015-06-19|